Probabilistic Frequent Itemset Mining on a GPU Cluster

Authors

  • Yusuke Kozawa
  • Toshiyuki Amagasa
  • Hiroyuki Kitagawa
Abstract

Probabilistic frequent itemset mining, which discovers frequent itemsets from uncertain data, has attracted much attention because of the uncertainty inherent in the real world. Many algorithms have been proposed to tackle this problem, but their performance is not satisfactory because handling uncertainty incurs a high processing cost. To accelerate such computation, we utilize GPUs (Graphics Processing Units). Our previous work accelerated an existing algorithm with a single GPU. In this paper, we extend that work to employ multiple GPUs. The proposed methods minimize the amount of data that needs to be communicated among GPUs and also achieve load balancing. Based on these methods, we also present algorithms for a GPU cluster. Experiments show that the single-node methods realize near-linear speedups, and the methods on a GPU cluster of eight nodes achieve up to a 7.1 times speedup.

Key words: GPU, uncertain databases, probabilistic frequent itemsets
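As a rough illustration of the problem setting only, the following minimal Python sketch computes the probability that an itemset is frequent in an uncertain database. It assumes one common formulation that the abstract itself does not spell out: each item in a transaction carries an independent existential probability, and an itemset counts as probabilistically frequent when the probability that its support reaches a minimum support threshold is high enough. The names `itemset_probability`, `support_distribution`, and `frequentness_probability` are hypothetical, and this sequential dynamic program is not the GPU method of the paper.

```python
from typing import Dict, Iterable, List

# Assumption: an uncertain transaction maps each item to the probability
# that the item actually exists in that transaction.
Transaction = Dict[str, float]


def itemset_probability(transaction: Transaction, itemset: Iterable[str]) -> float:
    """Probability that every item of `itemset` occurs in `transaction`,
    assuming items are independent. Missing items have probability 0."""
    p = 1.0
    for item in itemset:
        p *= transaction.get(item, 0.0)
    return p


def support_distribution(probs: List[float]) -> List[float]:
    """Return dist where dist[k] = P(support == k), given the per-transaction
    containment probabilities and assuming transactions are independent."""
    dist = [1.0]  # with zero transactions processed, support is 0 for sure
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # transaction does not contain the itemset
            nxt[k + 1] += mass * p      # transaction contains the itemset
        dist = nxt
    return dist


def frequentness_probability(db: List[Transaction],
                             itemset: Iterable[str],
                             minsup: int) -> float:
    """P(support(itemset) >= minsup) over the uncertain database `db`."""
    items = list(itemset)
    probs = [itemset_probability(t, items) for t in db]
    return sum(support_distribution(probs)[minsup:])


# Toy uncertain database (illustrative values, not from the paper).
db = [
    {"a": 0.5, "b": 1.0},
    {"a": 0.5},
    {"a": 1.0, "b": 0.5},
]
```

With this toy database, `frequentness_probability(db, ["a"], 2)` gives the probability that item `a` appears in at least two transactions; an itemset would then be reported as probabilistically frequent if that value meets a user-chosen probability threshold. The cost of this per-itemset computation grows with the database size, which is the kind of overhead the abstract attributes to handling uncertainty.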

Similar resources

Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting

Frequent Itemset Mining (FIM) is one of the most investigated fields of data mining. The goal of FIM is to find the most frequently occurring subsets from the transactions within a database. Many methods have been proposed to solve this problem, and the Apriori algorithm is one of the best-known methods for FIM in a transactional database. In ...

Research on Classification Mining Method of Frequent Itemset

The purpose of association mining is to find valuable relationships between data sets. Its prerequisite is to find the frequent itemsets first. In view of the existing problems in present frequent itemset mining, this paper proposes that data sets should be clustered first, and then the frequent itemset mining algorithm be applied to every cluster. In this way, algorithm of ...

Parallelizing Frequent Itemset Mining Process using High Performance Computing

Data is growing at an enormous rate, and mining this data is becoming a herculean task. Association rule mining is one of the important techniques used in data mining, and mining frequent itemsets is a crucial step in this process that consumes most of the processing time. Parallelizing the algorithm at various levels of computation will not only speed up the process but will also allow it to han...

A Combined Approach for Mining Fuzzy Frequent Itemset

Frequent Itemset Mining is an important approach for Market Basket Analysis. Earlier, frequent itemsets were determined based on binary customer transaction data. Recently, fuzzy data have been used to determine frequent itemsets because they capture the nature of a frequent itemset, i.e., they describe whether the frequent itemset consists of only highly purchased items or medium purchased...

Probabilistic Frequent Pattern Growth for Itemset Mining in Uncertain Databases (Technical Report)

Frequent itemset mining in uncertain transaction databases semantically and computationally differs from traditional techniques applied on standard (certain) transaction databases. Uncertain transaction databases consist of sets of existentially uncertain items. The uncertainty of items in transactions makes traditional techniques inapplicable. In this paper, we tackle the problem of finding pr...

Journal title:
  • IEICE Transactions

Volume 97-D  Issue 

Pages  -

Publication date 2014